Performance of a Multi-class Biomedical Tagger on Clinical Records
Abstract
We tested the performance of Cocoa, an existing dictionary/rule-based entity tagger that tags multiple semantic types in the biomedical domain, including diseases, on disease/sign/symptom detection in clinical records in the ShARe/CLEF eHealth task. Initial analysis showed that precision was high (≥ 90%) but recall was low (≈ 50%), owing to (a) phrases peculiar to clinical notes, (b) the need to disambiguate common words, and (c) the large number of undefined acronyms. We extended the system to handle these cases by reference to the local intrasentential context, as derived from the training set. A small module of about 30 rules was also added for event-based detection of annotated sentence fragments containing verbs/gerunds; an example is ‘LV systolic function appears depressed’. With these modifications, the f-score was 0.75 on the test set. In a second run, we added about 70 frequently occurring acronyms as well as 15 phrases that were all in caps. The final results on the test set (f = 0.78) show that a multi-class tagger can work reasonably well on clinical records.
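To put the reported figures in context, the sketch below computes an f-score from precision and recall. It assumes the balanced F1 measure (beta = 1), which the abstract does not state explicitly; the function name `f_score` is illustrative and not part of Cocoa. The inputs 0.90 and 0.50 are the approximate initial precision and recall quoted above, so the result is only a rough estimate of the starting point.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean (F1).

    Assumption: the abstract's "f-score" is the balanced F1 measure.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Approximate initial figures from the abstract: P >= 0.90, R ~ 0.50.
initial = f_score(0.90, 0.50)
print(f"{initial:.2f}")  # prints 0.64
```

Under that assumption, the initial configuration corresponds to an f-score of roughly 0.64, compared with the reported 0.75 and 0.78 after the two rounds of extensions.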
Similar resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
A Comparative Study of the Effect of Part-of-Speech Tagging on Parsing in Automatic Processing of the Persian Language
In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...
Evaluation of the Effect of Presence of Health Information Technology Expert on Medical Records of Patients Admitted to Fatemeh Zahra Hospital, Sari, Iran
Background: Documenting medical records plays an important role in treatment and prevention. The purpose of this study was to evaluate the impact of the presence of health information technology experts in clinical wards on the documentation of hospital admissions files. Methods: In this descriptive cross-sectional study, 96 inpatient records in 2014 and 96 inpatient records in Fatemeh Zahra H...
One Tagger, Many Uses: Illustrating the Power of Ontologies in Dictionary-based Named Entity Recognition
Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipel...
Parts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in Extracting Information from Biomedical Text
Background: An ongoing assessment of the literature is difficult with the rapidly increasing volume of research publications and limited effective information extraction tools which identify entity relationships from text. A recent study reported development of Muscorian, a generic text processing tool for extracting protein-protein interactions from text that achieved comparable performance to ...